Thermodynamically Important Contacts in Folding of Model Proteins
We introduce a quantity, the entropic susceptibility, that measures the thermodynamic importance of the contacts between amino acids in model proteins for the folding transition. Using this quantity, we find that a single equilibrium run of a computer simulation of a model protein is sufficient to select the subset of contacts that gives rise to the peak in the specific heat observed at the folding transition. To illustrate the method, we identify thermodynamically important contacts in a model 46-mer. We show that only about 50% of all contacts present in the native state of the protein are responsible for the sharp peak in the specific heat at the folding transition temperature, while the remaining 50% of contacts do not affect the specific heat.



Proteins are heteropolymers, composed of 20 types of amino acids, that perform specific functions. The amino acid composition of a protein determines its unique structure, function, and folding kinetics. Understanding the relevance of the interactions between amino acids to protein folding is a complex task that has been the subject of a number of theoretical and experimental studies [1-13]. The transition from the unfolded to the folded state of a protein is accompanied by a drastic reduction of the entropy. In one popular scenario, the folding transition of short proteins is analogous to nucleation at a first-order transition [...], with competition between two free energy minima: the folded state, with low energy and entropy, and the unfolded state, with high energy and entropy. These two minima are separated by a free energy barrier corresponding to the transition states.

At the folding transition temperature, $T_f$, there is an abrupt change in the energy of the system, resulting in a pronounced peak in the specific heat. At $T_f$, a small increase in the strength of the interaction between amino acids $i$ and $j$ (the "contact strength", set by $\epsilon_{ij}$) results in a rapid transition to the folded state, while a small decrease in contact strength results in a transition to the unfolded state. However, different amino acids contribute differently to the folding transition: a small variation in $\epsilon_{ij}$ has a different effect on the folding transition for different pairs $i$ and $j$. Here, we study the thermodynamic importance of each interaction during folding by computing the entropic susceptibility, the response function to a small perturbation of $\epsilon_{ij}$.

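We assume that the potential energy of the protein is additive in the pair potentials (contacts):

$$ U = \sum_{i<j} U_{ij} = \sum_{i<j} \epsilon_{ij}\, \phi(\mathbf{r}_i, \mathbf{r}_j), \tag{1} $$

where $U_{ij}$ is the energy of a single pair and $\phi(\mathbf{r}_i, \mathbf{r}_j)$ models the shape of the potential between amino acids at positions $\mathbf{r}_i$ and $\mathbf{r}_j$. We define the entropic susceptibility $\chi_{ij}$ of a contact between amino acids $i$ and $j$ as

$$ \chi_{ij} = -\epsilon_{ij} \frac{\partial S}{\partial \epsilon_{ij}} = \beta^2 \left( \langle U U_{ij} \rangle - \langle U \rangle \langle U_{ij} \rangle \right) = \beta^2 \langle \delta U \, \delta U_{ij} \rangle, \tag{2} $$

where $\delta U = U - \langle U \rangle$, $\delta U_{ij} = U_{ij} - \langle U_{ij} \rangle$, $\beta = 1/T$ (with $k_B = 1$), and $\langle \dots \rangle$ is the Boltzmann average [14].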
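Equation (2) can be estimated directly from the equilibrium samples of a single simulation run. The sketch below assumes, hypothetically, that the run has been stored as an array of per-pair energies $U_{ij}$ (one row per sampled configuration) in units with $k_B = 1$; the function name and the two-state toy ensemble are illustrative only and are not the authors' code. It also checks that the estimated $\chi_{ij}$ sum to $\beta^2 \langle \delta U^2 \rangle$.

```python
import numpy as np

def entropic_susceptibility(pair_energies, T):
    """Estimate chi_ij = beta^2 <dU dU_ij> (Eq. 2) from equilibrium samples.

    pair_energies: array (n_samples, n_pairs); row s holds U_ij for every
        pair in sampled configuration s (hypothetical storage layout).
    T: temperature in units with k_B = 1, so beta = 1 / T.
    """
    pair_energies = np.asarray(pair_energies, dtype=float)
    beta = 1.0 / T
    U = pair_energies.sum(axis=1)                       # total energy, Eq. (1)
    dU = U - U.mean()                                   # delta U
    dUij = pair_energies - pair_energies.mean(axis=0)   # delta U_ij per pair
    return beta**2 * (dU[:, None] * dUij).mean(axis=0)

# Hypothetical toy ensemble: 3 native contacts (eps_ij = -1) that are mostly
# formed in "folded" samples and mostly broken in "unfolded" ones.
rng = np.random.default_rng(0)
n_samples, T = 5000, 1.44
folded = rng.random(n_samples) < 0.5
p_formed = np.where(folded, 0.9, 0.1)[:, None]          # shape (n_samples, 1)
formed = rng.random((n_samples, 3)) < p_formed          # broadcast to 3 pairs
energies = -1.0 * formed                                # U_ij = eps_ij * phi_ij

chi = entropic_susceptibility(energies, T)
beta_sq_var_U = np.var(energies.sum(axis=1)) / T**2     # beta^2 <dU^2>
print(chi, np.isclose(chi.sum(), beta_sq_var_U))        # sum rule check
```

Because $\sum_{i<j} \delta U_{ij} = \delta U$ holds sample by sample, the sum rule is satisfied exactly by the estimate, not only on average.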

The entropic susceptibility measures the effect of a perturbation of the contact strength on the folding transition of the protein, thus quantifying the thermodynamic relevance of each contact for the folding transition. Next, we demonstrate how this measure can be used to study the contributions of the various contacts between amino acids to the folding transition. We simulate the "beads on a string" protein model [...], in which the amino acids are hard spheres of unit mass centered at the positions of the corresponding α-carbons. The potentials of interaction between amino acids are square wells of depth $\epsilon_{ij}$. We study the 46-mer (with folding transition temperature $T_f \approx 1.44$) that has been examined in [...]. We use the Go model for the contact potential $U_{ij}$: $U_{ij}$ is attractive ($\epsilon_{ij} = -1$) if the contact exists in the native (ground) state; otherwise the contact potential is repulsive ($\epsilon_{ij} = +1$) [15,16]. Our simulations employ the discrete molecular dynamics (MD) algorithm and are performed using the methods described in [...]. The matrix of native contacts of the 46-mer is shown in Fig. 1. This particular 46-mer is known to have a stable native state and to undergo first-order-like folding-unfolding transitions without stable intermediates [...].
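The Go-model assignment of $\epsilon_{ij}$ and the square-well pair energy $U_{ij} = \epsilon_{ij}\,\phi(\mathbf{r}_i, \mathbf{r}_j)$ described above can be sketched as follows. This is an illustration under stated assumptions: the well width `well_width` is a hypothetical parameter (no value is given in the text), the hard-core repulsion is omitted, and $\phi$ is taken to be 1 inside the well and 0 outside.

```python
import numpy as np

def go_contact_strengths(native_contacts):
    """Go model: eps_ij = -1 (attractive) for native contacts, +1 otherwise."""
    native = np.asarray(native_contacts, dtype=bool)
    return np.where(native, -1.0, 1.0)

def square_well_pair_energies(positions, eps, well_width):
    """U_ij = eps_ij * phi(r_i, r_j) for all pairs i < j.

    phi = 1 if |r_i - r_j| < well_width, else 0 (hard core omitted).
    Pairs are returned in np.triu_indices(n, k=1) order.
    """
    positions = np.asarray(positions, dtype=float)
    i, j = np.triu_indices(len(positions), k=1)
    dist = np.linalg.norm(positions[i] - positions[j], axis=1)
    phi = (dist < well_width).astype(float)
    return eps[i, j] * phi

# Hypothetical 3-bead example with one native contact between beads 0 and 2.
native = np.array([[0, 0, 1],
                   [0, 0, 0],
                   [1, 0, 0]])
eps = go_contact_strengths(native)
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
pair_U = square_well_pair_energies(positions, eps, well_width=1.2)
print(pair_U)  # energies of pairs (0,1), (0,2), (1,2)
```

Here the non-native pair (0,1) sits inside the well and is penalized, the native pair (0,2) is rewarded, and pair (1,2) lies outside the well and contributes nothing.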

We calculate $\chi_{ij}$ at different temperatures below and above $T_f$. A histogram of the values of $\chi_{ij}$ for various $T$ is shown in Fig. 2. For $T \approx T_f$ the distribution has a pronounced peak at large values of $\chi_{ij}$, which indicates that the contacts separate into two distinct sets, with large and small values of $\chi_{ij}$. We call the contacts with large values of $\chi_{ij}$ "thermodynamically important contacts," since for these contacts a small variation in strength is correlated with a drastic change in the entropy of the model protein. To select the thermodynamically important contacts, we define a temperature-dependent threshold $\chi_{\mathrm{th}}(T)$ equal to the value of $\chi_{ij}$ at which the distribution over all contacts has its maximum.
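A minimal sketch of the threshold selection, taking $\chi_{\mathrm{th}}$ as the center of the tallest histogram bin. The text does not specify the binning or whether contacts with near-zero $\chi_{ij}$ are excluded, so the bin count and the `min_value` cut below are hypothetical choices, and the susceptibilities are synthetic.

```python
import numpy as np

def susceptibility_threshold(chi, bins=30, min_value=None):
    """Return the chi value at the maximum of the histogram of chi_ij.

    min_value (hypothetical option) ignores bins centered below it, e.g. to
    skip a peak of near-zero susceptibilities.
    """
    counts, edges = np.histogram(np.asarray(chi, dtype=float), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    if min_value is not None:
        counts = np.where(centers >= min_value, counts, -1)
    return centers[np.argmax(counts)]

# Hypothetical susceptibilities: a cluster near 0 and a peak near 3.2.
rng = np.random.default_rng(1)
chi = np.concatenate([rng.normal(0.0, 0.05, 300), rng.normal(3.2, 0.3, 200)])
chi_th = susceptibility_threshold(chi, min_value=1.0)
important = chi > chi_th        # thermodynamically important contacts
print(round(chi_th, 2), important.sum())
```

Since the threshold sits at the mode of the upper peak, roughly half of the contacts in that peak end up above it.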

Interestingly, thermodynamically important contacts are not randomly distributed in 3D space but are concentrated within well-defined structural regions of the model protein. Figure 3 shows the intensity map of the values of $\chi_{ij}$. In the upper part of Fig. 3 we show only the values of $\chi_{ij}$ above the threshold $\chi_{\mathrm{th}}(T_f) = 3.2$, which, according to our definition, correspond to the thermodynamically important contacts. Although about 50% of the native contacts are above threshold, the filtered map in Fig. 3 shows that they are clustered together in well-defined regions of the model protein. Further, we find that the regions of thermodynamically important interactions ($\chi_{ij}(T) > \chi_{\mathrm{th}}(T)$) in the filtered map remain qualitatively the same as those shown in Fig. 3 for temperatures in the range $T = T_f \pm 5\%$.

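To verify that the thermodynamically important contacts are indeed the most relevant thermodynamically to the folding of our 46-mer, we measure their contribution to the specific heat. Summing Eq. (2) over all pairs and using $\sum_{i<j} \delta U_{ij} = \delta U$ gives

$$ C_V = \beta^2 \langle \delta U^2 \rangle = \sum_{i<j} \chi_{ij}. \tag{3} $$

Thus, we can interpret $\chi_{ij}$ as the contribution of a single contact to $C_V$. It is then possible to partition $C_V$ as

$$ C_V = C_V^{\mathrm{TIC}} + C_V^{\mathrm{others}}, \qquad C_V^{\mathrm{TIC}} = \sum_{\chi_{ij} > \chi_{\mathrm{th}}(T)} \chi_{ij}, \qquad C_V^{\mathrm{others}} = \sum_{\chi_{ij} \le \chi_{\mathrm{th}}(T)} \chi_{ij}, \tag{4} $$

where $C_V^{\mathrm{TIC}}$ arises from the thermodynamically important contacts and $C_V^{\mathrm{others}}$ from the contacts below the threshold $\chi_{\mathrm{th}}(T)$. Figure 4 shows that the thermodynamically important contacts give a sharp contribution to the specific heat around $T_f$. We find that the number of contacts above the threshold $\chi_{\mathrm{th}}(T_f)$ is about 50% of the number of contacts in the native state, in agreement with Flory-type arguments [...].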
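The partition of Eq. (4) and the fraction of native contacts above threshold amount to a masked sum and a count. A minimal sketch with hypothetical helper names and made-up susceptibilities for four native contacts:

```python
import numpy as np

def partition_specific_heat(chi, chi_th):
    """Split C_V = sum chi_ij into C_V^TIC (chi_ij > chi_th) and C_V^others."""
    chi = np.asarray(chi, dtype=float)
    above = chi > chi_th
    return chi[above].sum(), chi[~above].sum()

def fraction_above_threshold(chi_native, chi_th):
    """Fraction of native contacts whose chi_ij exceeds the threshold."""
    chi_native = np.asarray(chi_native, dtype=float)
    return np.count_nonzero(chi_native > chi_th) / chi_native.size

# Hypothetical susceptibilities of four native contacts at T_f.
chi = [4.0, 3.5, 1.0, 0.2]
cv_tic, cv_others = partition_specific_heat(chi, chi_th=3.2)
print(cv_tic, cv_others, cv_tic + cv_others)   # C_V^TIC, C_V^others, total C_V
print(fraction_above_threshold(chi, 3.2))
```

Repeating this at each simulated temperature, with $\chi_{\mathrm{th}}(T)$ chosen per temperature, yields curves of the kind compared in Fig. 4.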

It is natural to ask whether the thermodynamically important contacts could be determined by analyzing the average contact energies $\langle U_{ij} \rangle$, which are related to the contact frequency map $\{f_{ij}\}$. For square-well potentials, $\langle U_{ij} \rangle = \epsilon_{ij} f_{ij}$, where $f_{ij}$ is the contact frequency for amino acids $i$ and $j$. We find that the contacts with the largest values of $f_{ij}$ are nearest and next-nearest neighbors. Thus, to account for the long-range contacts, we have to go beyond the estimation of frequencies.
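An alternative way of computing the entropic susceptibility is to note that the contact frequencies are related to the change in the free energy $F$,

$$ \frac{\partial F}{\partial \epsilon_{ij}} = f_{ij}, \tag{5} $$

and therefore, since $S = -\partial F / \partial T$, the entropic susceptibility can be rewritten as

$$ \chi_{ij} = \epsilon_{ij} \frac{\partial}{\partial \epsilon_{ij}} \frac{\partial F}{\partial T} = \frac{\partial}{\partial T} \left( \epsilon_{ij} \frac{\partial F}{\partial \epsilon_{ij}} \right) = \frac{\partial \langle U_{ij} \rangle}{\partial T} = \epsilon_{ij} \frac{\partial f_{ij}}{\partial T}. \tag{6} $$

Thus, the information about the thermodynamically important contacts can be inferred from the temperature derivative of the frequency map.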
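Equation (6) suggests estimating $\chi_{ij}$ by finite differences of frequency maps measured at two nearby temperatures. The sketch below uses a central difference and checks it against a hypothetical one-contact model (one formed state of energy $\epsilon$, $g$ degenerate broken states of energy 0, $k_B = 1$), for which Eq. (2) gives $\chi = \beta^2 \epsilon^2 f(1-f)$. The model and the values of `g` and `dT` are illustrative assumptions.

```python
import numpy as np

def chi_from_frequency_map(f_minus, f_plus, eps, dT):
    """Central-difference estimate of Eq. (6): chi_ij ~ eps_ij * df_ij/dT,
    from frequency maps measured at T - dT and T + dT."""
    return np.asarray(eps) * (np.asarray(f_plus) - np.asarray(f_minus)) / (2.0 * dT)

def toy_contact_frequency(T, eps=-1.0, g=50.0):
    """Hypothetical single-contact model: one formed state of energy eps and
    g degenerate broken states of energy 0 (k_B = 1)."""
    w = np.exp(-eps / T)
    return w / (w + g)

T, dT, eps = 1.44, 1e-4, -1.0
f = toy_contact_frequency(T)
chi_fd = chi_from_frequency_map(toy_contact_frequency(T - dT),
                                toy_contact_frequency(T + dT), eps, dT)
# With a single contact U = U_ij, so Eq. (2) gives beta^2 * eps^2 * f * (1 - f).
chi_exact = eps**2 * f * (1.0 - f) / T**2
print(chi_fd, chi_exact)
```

In a real simulation the two frequency maps carry sampling noise, so `dT` would have to be a compromise between truncation error and statistical error.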

"Core contacts" were defined in [...] as the contacts that form the most stable elements of the three-dimensional structure of the protein, which remain intact at the folding transition temperature. Specifically, they were defined as contacts present with frequency above 0.5 at $T_f$. Molecular dynamics simulations performed at $T = T_f$ (see Fig. 5) show that these contacts are mostly short-range. (The range of the contact between residues $i$ and $j$ is defined as $|i - j|$.) This result is not surprising, since local contacts can form with high probability even in the unfolded state at $T \approx T_f$.
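The core-contact criterion ($f_{ij} > 0.5$) and the contact range $|i - j|$ are straightforward to compute from a frequency map. The sketch below uses a hypothetical 5-residue map purely for illustration, with 0-based residue indices.

```python
import numpy as np

def core_contacts(freq, cutoff=0.5):
    """Pairs (i, j), i < j, whose contact frequency f_ij exceeds cutoff."""
    freq = np.asarray(freq, dtype=float)
    i, j = np.triu_indices(freq.shape[0], k=1)
    keep = freq[i, j] > cutoff
    return list(zip(i[keep].tolist(), j[keep].tolist()))

def contact_ranges(pairs):
    """Range |i - j| of each contact."""
    return [abs(i - j) for i, j in pairs]

# Hypothetical 5-residue frequency map (symmetric, values in [0, 1]).
freq = np.array([[0.0, 0.9, 0.7, 0.1, 0.2],
                 [0.9, 0.0, 0.8, 0.3, 0.1],
                 [0.7, 0.8, 0.0, 0.6, 0.4],
                 [0.1, 0.3, 0.6, 0.0, 0.9],
                 [0.2, 0.1, 0.4, 0.9, 0.0]])
core = core_contacts(freq)
print(core, contact_ranges(core))
```

In this toy map every core contact is local ($|i - j| \le 2$), mirroring the qualitative observation above; applying the same range calculation to the pairs with $\chi_{ij} > \chi_{\mathrm{th}}$ allows the two sets to be compared.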

In contrast, we find that the thermodynamically important contacts are mostly long-range, with $|i - j| \gg 1$ (see Fig. 3). By definition, the thermodynamically important contacts correlate with the potential energy; thus they are likely to be present in the folded state, with low potential energy, and absent in the unfolded states, with high potential energy. Therefore, we believe that they are important for stabilization of the native structure. This hypothesis agrees with the general observation [...] that long-range interactions are important for protein stabilization.

We also find that the set of thermodynamically important contacts contains all five folding-nucleus contacts ((11,39), (10,40), (11,40), (10,41), and (11,41)) discovered in [...], indicating the dual role some amino acids play in protein folding: the nucleus residues, which play a crucial role in the kinetics of the folding transition, may also be important for stabilizing proteins in their native state. The existence of such residues is supported by evolutionary [...] and phenomenological [...] studies.

In conclusion, we have demonstrated that by calculating the cross-correlations between the potential energy of a single contact and the total potential energy of a model protein, it is possible to identify the set of contacts that are thermodynamically most relevant to the folding process. This method of identifying thermodynamically important contacts is simple and can be implemented in molecular dynamics studies of model proteins. The computational effort can be directed to aid experimental studies of real proteins.
FIG. 1. Contact map of the native state of the 46-mer: dark squares denote residues that have contacts in the native state. Interactions are assigned according to the Go model [...]: all pairs of residues that have a contact in the native state are assigned an attractive potential ($\epsilon_{ij} = -1$), while the remaining pairs of residues are assigned a repulsive potential ($\epsilon_{ij} = +1$).
FIG. 3. The role of the thermodynamically important contacts: the lower corner is the intensity map of the entropic susceptibility $\chi_{ij}$ obtained from simulations of the 46-mer at $T_f$. Darker colors correspond to higher values of $\chi_{ij}$. The upper corner is the "filtered" map, where only values of $\chi_{ij}$ above the threshold $\chi_{\mathrm{th}} = 3.2$ defined in Fig. 2 are presented. Note that short-range contacts ($i \approx j$, corresponding to near-diagonal elements of the matrix $\chi_{ij}$) do not contribute significantly to the change of entropy at the folding transition, while the relevant long-range contacts ($|i - j| \gg 1$) are clustered in islands in the filtered map. Specifically, the folding-nucleus contacts determined in [...] belong to the cluster in the top left corner with $i \approx 10$ and $j \approx 40$.
FIG. 2. Histogram of the entropic susceptibility $\chi_{ij}$ for the 46-mer at temperatures $T = 1.31$, $1.44$, and $1.55$. At $T_f$ the distribution of $\chi_{ij}$ has a pronounced peak centered at $\chi_{ij} = 3.2$. Accordingly, we choose $\chi_{\mathrm{th}} = 3.2$.



[Plot for Fig. 4: specific heat $C_V$ versus temperature $T$ (from 1.0 to 2.0); curves: total $C_V$, $C_V^{\mathrm{TIC}}$, and $C_V^{\mathrm{others}}$.]



FIG. 4. Specific heat of the 46-mer as a function of temperature (solid line). There is a pronounced peak in the distribution of $\chi_{ij}$ in the region $1.31 < T < 1.55$. We separate the contribution to the specific heat from the thermodynamically important contacts, $C_V^{\mathrm{TIC}}$ (squares), from that of the remaining contacts, $C_V^{\mathrm{others}}$ (circles). The number of contacts above threshold is always found to be approximately 50% of the number of native contacts.
FIG. 5. The map of core contacts of the 46-mer: dark squares denote contacts between residues $i$ and $j$ with frequencies $f_{ij} > 0.5$. The core consists mostly of local contacts.